Lab 2 - High-Frequency Models

Microstructures and Trading Systems



Ricardo Figueroa Acosta, facosta@iteso.mx

06.2022 | Repository: Link


High-Frequency Trading Models

Asset Pricing Theory (APT-Martingale) - Roll Model


Abstract

High-frequency trading models offer an interesting perspective in quantitative finance because they approach trading differently from traditional finance. In this project we study two of the best-known high-frequency models. The first comes from Asset Pricing Theory (APT) and treats prices as a martingale: from a market microstructure perspective, the expected future price equals the present price, i.e. $E[p_{t+1} \mid p_t] = p_t$, which we test empirically by checking how often $p_{t+1} = p_t$.

The second, also from a market microstructure perspective, is the Roll model (1984), a simple but useful model showing that the bid-ask spread can be estimated from the variance and first-order autocovariance of price changes:
$Spread = 2\sqrt{-cov}$

1. Introduction


In this project we examine two of the best-known high-frequency trading models. The first comes from Asset Pricing Theory, which states that the price of a risky asset, $P_t$, derived from a consumption model over present and future time, can be modeled from the market microstructure perspective as a stochastic process called a martingale.

A martingale is a sequence of random variables (i.e., a stochastic process) for which, at any particular time (here measured in seconds or milliseconds), the conditional expectation of the next value in the sequence equals the present value, regardless of all prior values: $E[p_{t+1} \mid p_t, p_{t-1}, \ldots] = p_t$. In this project the property is checked empirically through the stricter condition $p_{t+1} = p_t$.

We test this using order book data containing about 2400 timestamps. The observations are grouped by minute, and within each minute we count the occurrences of martingales and non-martingales.

The second model is the Roll model (1984), whose purpose is to generate a theoretical spread. It requires two major assumptions:

1) The asset is traded in an informationally efficient market.

2) The probability distribution of observed price changes is stationary (at least for short intervals of, say, two months).

Given these assumptions, the model only requires some basic statistical calculations, and we will see whether it gives a good approximation of the spread.

2. Install/Load Packages and Dependencies


2.1 Python Packages

To run this notebook, the following packages must be installed; they are listed in the requirements.txt file:

  • numpy>=1.2
  • pandas>=1.4
  • plotly>=5.8

2.2 File Dependencies

  • data.py: ordered data.
  • functions.py: functions used to solve main problems and perform experiments.
  • main.py: main script to call functions, visualizations and gather results.
  • visualizations.py: functions used to plot results.
  • requirements.txt: all libraries used and necessary to run this project.
  • orderbooks_05jul21.json: Orderbook containing raw data used in the project.

2.3 Libraries and scripts

In [ ]:
import plotly.io as pio
pio.renderers.default='notebook'

import main as mn
import data as dt
import functions as fn
import visualizations as vz

3. Data Description


orderbooks_05jul21.json:

A dataset containing about 2400 order books from the Bitfinex exchange, stored as a dictionary keyed by timestamp. Each timestamp contains the following:

  • 'ask_size': Volume
  • 'ask': Price
  • 'bid': Price
  • 'bid_size': Volume

4. First experiment - APT


In [ ]:
help(fn.experiments)
Help on function experiments in module functions:

experiments(ob_data: dict, ob_ts: list, method: str) -> pandas.core.frame.DataFrame
    Function used to perform experiments with orderbook data.
    
    arguments:
    ----------
    ob_data: dictionary
    dictionary type with the following structure:
    'timestamp'
    'bid_size'
    'bid'
    'ask'
    'ask_size'
    
    ob_ts: list
    list with timestamps in string format.
    
    method: str: 'midprice' or 'wmidprice'
    string with the method that's going to be used in calculations.
    
    Returns -> dataframe 
    
    References:
    ----------
    [1] Martingale. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Martingale&oldid=49256

4.1 Experiment with midprice

In [ ]:
e1 = mn.e1
In [ ]:
e1[:3]
Out[ ]:
    intervalo  total  e1  e1_proportion  e2  e2_proportion
0           0     41  27         0.6585  14         0.3415
1           1     40  32         0.8000   8         0.2000
2           2     39  27         0.6923  12         0.3077
In [ ]:
e1.tail(3)
Out[ ]:
    intervalo  total  e1  e1_proportion  e2  e2_proportion
57         57     41  29         0.7073  12         0.2927
58         58     40  27         0.6750  13         0.3250
59         59     39  31         0.7949   8         0.2051

4.2 Demonstration

To test whether the midprices or weighted midprices follow a martingale process, we check the following condition: $p_{t+1} = p_t$.

Where

$p_{t+1} = \text{future price}$

and

$p_{t} = \text{present price}$

This is easy to check with a list comprehension:

True_False_list = [midprice[i+1] == midprice[i] for i in range(len(midprice)-1)]
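
The per-minute counting behind the tables above can be sketched with pandas as follows. This is only an illustration under stated assumptions: the function name `martingale_counts`, the sample prices and timestamps, and the choice to assign each comparison to the minute of the earlier observation are hypothetical and may differ from what `fn.experiments` actually does. The output columns mirror the ones shown in the results (`e1` for martingales, `e2` for non-martingales).

```python
import pandas as pd


def martingale_counts(prices: pd.Series) -> pd.DataFrame:
    """Count p_{t+1} == p_t per minute for a price series with a DatetimeIndex."""
    # Compare each price with its successor; the last row has no successor.
    unchanged = (prices.shift(-1) == prices).iloc[:-1].astype(int)
    grouped = unchanged.groupby(unchanged.index.floor("min"))
    out = pd.DataFrame({"total": grouped.size(), "e1": grouped.sum()})
    out["e1_proportion"] = (out["e1"] / out["total"]).round(4)
    out["e2"] = out["total"] - out["e1"]
    out["e2_proportion"] = (out["e2"] / out["total"]).round(4)
    out = out.reset_index(drop=True)
    out.insert(0, "intervalo", list(range(len(out))))
    return out


# Hypothetical sample: six observations spread over two minutes.
ts = pd.to_datetime([
    "2021-07-05 13:00:00", "2021-07-05 13:00:20", "2021-07-05 13:00:40",
    "2021-07-05 13:01:00", "2021-07-05 13:01:20", "2021-07-05 13:01:40",
])
sample_prices = pd.Series([100.0, 100.0, 100.5, 100.5, 100.5, 101.0], index=ts)
counts = martingale_counts(sample_prices)
print(counts)
```

Exact equality on floats is acceptable here only because quoted prices are repeated verbatim between snapshots; for derived prices such as a weighted midprice, a tolerance-based comparison may be more appropriate.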

In [ ]:
print('The average proportion of martingale is', mn.e11_mean, 'and for no-martingales:', mn.e12_mean)
The average proportion of martingale is 0.73 and for no-martingales: 0.27

4.3 Experiment with weighted midprice

In [ ]:
e2 = mn.e2
In [ ]:
e2[:3]
Out[ ]:
    intervalo  total  e1  e1_proportion  e2  e2_proportion
0           0     41  27         0.6585  14         0.3415
1           1     40  27         0.6750  13         0.3250
2           2     39  26         0.6667  13         0.3333
In [ ]:
e2.tail(3)
Out[ ]:
    intervalo  total  e1  e1_proportion  e2  e2_proportion
57         57     41  27         0.6585  14         0.3415
58         58     40  27         0.6750  13         0.3250
59         59     39  26         0.6667  13         0.3333
In [ ]:
print('The average proportion of martingale is', mn.e21_mean, 'and for no-martingales:', mn.e22_mean)
The average proportion of martingale is 0.68 and for no-martingales: 0.32

The proportions differ between the midprice and the weighted midprice, but the martingale proportion is still the higher one in both cases.

5. Second experiment - Roll Model

5.1 Final Parameter #1

To obtain the first parameter, we first compute the differences between consecutive prices and store them in a list; we then use the NumPy library to compute the variance of these values.
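
As a minimal sketch of this step, assuming the prices are available as a one-dimensional array: the sample values are hypothetical, and whether `fn.roll_model` uses the population variance (NumPy's default, `ddof=0`) or the sample variance is not stated in this notebook.

```python
import numpy as np

# Hypothetical price series that bounces between two levels.
prices = np.array([100.0, 100.5, 100.0, 100.5, 100.0])

# Differences between consecutive prices: P_t - P_{t-1}.
delta_p = np.diff(prices)

# Population variance of the price changes (ddof=0 is NumPy's default).
variance = np.var(delta_p)
print(delta_p, variance)
```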

In [ ]:
print('the first parameter of the model is the variance, which is:',fn.roll_model(dt.data)['Final_Parameters'][0])
the first parameter of the model is the variance, which is: 8.42555592321929

5.2 Final Parameter #2

The second parameter is obtained in the same way, except that instead of the variance we compute the first-order autocovariance of the price differences.
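
One way to compute a first-order autocovariance is to take the covariance between the series of price changes and the same series shifted by one step. The sketch below uses `np.cov` for that; the helper name `autocov_lag1` and the sample data are hypothetical, and the project's own implementation may use a different normalization.

```python
import numpy as np


def autocov_lag1(x) -> float:
    """Covariance between x_{t-1} and x_t (np.cov default, i.e. divided by n - 1)."""
    x = np.asarray(x, dtype=float)
    # np.cov returns a 2x2 matrix; the off-diagonal entry is the cross-covariance.
    return float(np.cov(x[:-1], x[1:])[0, 1])


# Hypothetical price changes that alternate in sign, as a bid-ask bounce would.
delta_p = [0.5, -0.5, 0.5, -0.5]
gamma_1 = autocov_lag1(delta_p)
print(gamma_1)
```

A negative value is what the Roll model expects, since trades alternating between the bid and the ask make consecutive price changes move in opposite directions.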

In [ ]:
print('the second parameter of the model is the autocovariance, which is:',fn.roll_model(dt.data)['Final_Parameters'][1])
the second parameter of the model is the autocovariance, which is: -0.0012275561298996467

5.3 Main Parameters:

$Var(\Delta P_t) = 2C^{2} + \sigma_u^{2}$

$Cov(\Delta P_{t-1}, \Delta P_t) = -C^{2}$
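
Solving the covariance equation for $C$ gives $C = \sqrt{-Cov}$, and since the spread is $2C$ we recover $Spread = 2\sqrt{-cov}$; the variance equation then gives $\sigma_u^{2} = Var(\Delta P_t) - 2C^{2}$. The sketch below plugs in the two values printed in sections 5.1 and 5.2. The function name `roll_spread` and the choice to return NaN when the autocovariance is not negative are assumptions made for illustration, not the project's code.

```python
import math

var_dp = 8.42555592321929          # Var(ΔP_t), value printed in section 5.1
cov_dp = -0.0012275561298996467    # first-order autocovariance, printed in section 5.2


def roll_spread(autocov: float) -> float:
    """Roll (1984) spread, 2 * sqrt(-autocov); NaN if the autocovariance is not negative."""
    if autocov >= 0:
        return float("nan")
    return 2 * math.sqrt(-autocov)


c = math.sqrt(-cov_dp)             # half-spread C
spread = roll_spread(cov_dp)       # theoretical spread, 2C
sigma_u2 = var_dp - 2 * c ** 2     # variance of the efficient-price innovation
print(spread, sigma_u2)
```

With these inputs the theoretical spread comes out at roughly 0.07 price units, which is consistent with the very narrow band seen in the theoretical plot.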

6. Results


6.1 Results of first experiment

In [ ]:
help(vz.exp1_plot)
Help on function exp1_plot in module visualizations:

exp1_plot(df: pandas.core.frame.DataFrame, x: str, y: str) -> 'stackedbarplot'
    Function used to plot a stacked bar for experiment 1
    
    arguments:
    ----------
    df: DataFrame
    DataFrame containing results from a martingale analysis
    
    x: str
    x-axis (0-59 minutes)
    
    y: str
    y-axis (martingales and non martingales)
    
    Returns -> stacked barplot

6.1.1 Experiment with midprice

In [ ]:
e1_plot = vz.exp1_plot(df = e1, x = 'intervalo', y = ['e1', 'e2'])

The result is consistent throughout the hour, though less so than the weighted midprice result.

6.1.2 Experiment with weighted midprice

In [ ]:
e2_plot = vz.exp1_plot(df = e2, x = 'intervalo', y = ['e1', 'e2'])

The result is consistent throughout the hour.

Comparing the two results, we can conclude that, from a market microstructure perspective, about 70% of the observations behave as a martingale.

6.2 Results of second experiment

6.2.1 Observed Values

In [ ]:
vz.plot_roll(dt.data, vz.df_roll, 'observed', True)

As we can see, this time series is made up of the midprice and the observed ask and bid, and there is a small spread between the lines.

6.2.2 Theoretical Values

In [ ]:
vz.plot_roll(dt.data, vz.df_roll, 'theorical', True)

Unlike the previous plot, here the series are barely distinguishable. This is because the theoretical spread is very small, so the gap between the lines is correspondingly small.

7. Conclusion

APT

The APT model is fairly abstract: it involves many concepts and may seem advanced and difficult to understand. But after looking more closely at its foundations, its purpose and meaning become clear: the best prediction we can make about the future is what is already happening in the present. I think this makes much more sense from the market microstructure perspective, i.e., when time intervals are very small. As found in this project, from the market microstructure perspective a price behaves as a martingale about 70% of the time.

Roll

This model does not require many metrics or calculations; the difficult part was identifying its main point. We found that the model differs considerably from the observed values, so I conclude that it was not accurate here, and many corrections and extensions would be needed to improve it.

8. References


[1] Munnoz, 2020. Python project template. https://github.com/iffranciscome/python-project. (2021).

[2] Martingale. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Martingale&oldid=49256